Nearest Neighbor Search using Kd-trees
Abstract
We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search that improves its performance. The kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades even when the number of dimensions increases to more than three. Since the exact nearest neighbor search problem suffers from the curse of dimensionality, we focus on approximate solutions: a c-approximate nearest neighbor is any point whose distance from the query is at most c times the distance to the true nearest neighbor. We show that, for a randomly constructed database of points, the traditional kd-tree search algorithm has a very low probability of finding an approximate nearest neighbor; the probability of success drops exponentially in the number of dimensions d, as e^(…). However, a simple change to the search algorithm results in a much higher chance of success. Instead of searching for the query point in the kd-tree, we search for a random set of points in the neighborhood of the query point. It turns out that searching for e^(…) such points can find the c-approximate nearest neighbor with a much higher chance of success.
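The abstract gives no pseudocode, so the following is only a minimal sketch of the idea it describes. It assumes that the "traditional" search is a single root-to-leaf descent with no backtracking, that the tree splits on axes in round-robin order at the median, and that neighborhood points are drawn by adding Gaussian noise to the query. The names `build`, `descend`, `simple_search`, `perturbed_search` and the parameters `leaf_size`, `radius`, `trials` are all hypothetical and do not come from the paper.

```python
import math
import random


def build(points, depth=0, leaf_size=4):
    """Build a kd-tree as nested dicts; leaves hold at most leaf_size points."""
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % len(points[0])          # assumed: round-robin split axis
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # assumed: median split
    return {
        "axis": axis,
        "split": pts[mid][axis],
        "left": build(pts[:mid], depth + 1, leaf_size),
        "right": build(pts[mid:], depth + 1, leaf_size),
    }


def descend(node, q):
    """Follow q down to exactly one leaf, with no backtracking."""
    while "leaf" not in node:
        side = "left" if q[node["axis"]] < node["split"] else "right"
        node = node[side]
    return node["leaf"]


def simple_search(tree, q):
    """Return the point closest to q among those in q's own leaf."""
    return min(descend(tree, q), key=lambda p: math.dist(p, q))


def perturbed_search(tree, q, radius, trials, rng):
    """Also look in the leaves reached by random points near q.

    Each probe is q plus Gaussian noise of scale radius (an assumed
    sampling scheme); candidates are always scored against q itself.
    """
    best = simple_search(tree, q)
    for _ in range(trials):
        probe = [x + rng.gauss(0.0, radius) for x in q]
        cand = min(descend(tree, probe), key=lambda p: math.dist(p, q))
        if math.dist(cand, q) < math.dist(best, q):
            best = cand
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    db = [tuple(rng.random() for _ in range(8)) for _ in range(1000)]
    tree = build(db)
    query = tuple(rng.random() for _ in range(8))
    print(math.dist(simple_search(tree, query), query))
    print(math.dist(perturbed_search(tree, query, 0.1, 100, rng), query))
```

Because the perturbed search starts from the simple search's answer and only replaces it with a closer candidate, it can never return a worse point in this sketch; how many probes are needed for a given approximation factor c is what the paper analyzes and is not reproduced here.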
Similar resources
An Improved Algorithm Finding Nearest Neighbor Using Kd-trees
We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than two. Since the exact nearest neighbor search problem suffers from the curse of dimension...
Which Space Partitioning Tree to Use for Search - Summary
Trees such as binary-space-partitioning trees, kd-trees, principal axis trees, and random projection trees are used to answer the question "which tree to use for nearest-neighbor search?" This paper deals with the influence of the vector quantization performance of the trees on the search performance and the margins of the partitions in these trees. Theoretical results show that both factors have ...
RapidPolygonLookup: An R package for polygon lookup using kd trees
Coordinate level spatial data need to be frequently aggregated to higher geographical identities like census blocks, ZIP codes or police district boundaries for analysis. This process requires mapping each point in the given data set to an individual element of the desired geographical hierarchy. Unless efficient data structures are used, this can be a daunting task. The operation point.in.poly...
Randomly Projected KD-Trees with Distance Metric Learning for Image Retrieval
Efficient nearest neighbor (NN) search techniques for high-dimensional data are crucial to content-based image retrieval (CBIR). Traditional data structures (e.g., kd-tree) are usually efficient only for low-dimensional data, and often perform no better than a simple exhaustive linear search when the number of dimensions is large enough. Recently, approximate NN search techniques have been propo...
Probabilistic cost model for nearest neighbor search in image retrieval
We present a probabilistic cost model to analyze the performance of the kd-tree for nearest neighbor search in the context of content-based image retrieval. Our cost model measures the expected number of kd-tree nodes traversed during the search query. We show that our cost model has high correlations with both the observed number of traversed nodes and the runtime performance of search queries...
Publication date: 2006